### Install and load packages if necessary
pacman::p_load(tidyverse, magrittr,
tidymodels
)
Welcome to this introduction to machine learning (ML). In this session we cover the following topics:
1. Generalizing and validating from ML models
2. The bias-variance trade-off
3. Out-of-sample testing and cross-validation workflows
4. Implementing ML workflows with the tidymodels ecosystem
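Before we start: the bias-variance trade-off can be made concrete with a small base-R simulation (the data, seed, and polynomial degrees here are made up for illustration and are not part of the exercise). A very flexible model fits the training data ever more closely, but that does not have to carry over to unseen data:

```r
# Toy illustration of the bias-variance trade-off (base R only):
# fit polynomials of increasing degree and compare train vs. test error.
set.seed(123)
n <- 100
x <- runif(n, -3, 3)
y <- sin(x) + rnorm(n, sd = 0.5)           # true signal + noise
train_idx <- sample(n, 70)
train <- data.frame(x = x[train_idx], y = y[train_idx])
test  <- data.frame(x = x[-train_idx], y = y[-train_idx])

rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))

errors <- sapply(c(1, 3, 12), function(degree) {
  fit <- lm(y ~ poly(x, degree), data = train)
  c(degree = degree,
    train  = rmse(train$y, predict(fit)),
    test   = rmse(test$y, predict(fit, newdata = test)))
})
# Training error always falls with flexibility (nested models);
# test error typically rises again for very flexible fits.
t(errors)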
Remember, the steps in the ML workflow are:
Visualizing and exploring data
Validating models
Communicating insights
While the early data-wrangling and exploration steps are mainly covered by general tidyverse packages such as dplyr and ggplot2, the final communication step can be done using, for instance, rmarkdown (as done here) or an interactive shiny application. We will touch upon that, but the main focus here lies on model fitting and validation, the core of ML work.
These steps are mainly covered by the packages to be found in the tidymodels ecosystem, which take care of sampling, fitting, tuning, and evaluating models and data.
Let's get started!
We will load a standard dataset from mlbench, the BostonHousing dataset. It comes as a data frame with 506 observations on 14 features, the last one, medv, being the outcome:
- crim: per capita crime rate by town
- zn: proportion of residential land zoned for lots over 25,000 sq.ft
- indus: proportion of non-retail business acres per town
- chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) (deselected in this case)
- nox: nitric oxides concentration (parts per 10 million)
- rm: average number of rooms per dwelling
- age: proportion of owner-occupied units built prior to 1940
- dis: weighted distances to five Boston employment centres
- rad: index of accessibility to radial highways
- tax: full-value property-tax rate per USD 10,000
- ptratio: pupil-teacher ratio by town
- b: 1000(B - 0.63)^2 where B is the proportion of blacks by town
- lstat: lower status of the population
- medv: median value of owner-occupied homes in USD 1000's (our outcome to predict)

Source: Harrison, D. and Rubinfeld, D.L. "Hedonic prices and the demand for clean air", J. Environ. Economics & Management, vol. 5, 81-102, 1978.
These data have been taken from the UCI Repository Of Machine Learning Databases
library(mlbench) # Library including many ML benchmark datasets
data(BostonHousing)
data <- BostonHousing %>% as_tibble() %>% select(-chas)
rm(BostonHousing)
data %>%
glimpse()
## Rows: 506
## Columns: 13
## $ crim <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829…
## $ zn <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, …
## $ indus <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7…
## $ nox <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524…
## $ rm <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631…
## $ age <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, …
## $ dis <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.950…
## $ rad <dbl> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4…
## $ tax <dbl> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 3…
## $ ptratio <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 1…
## $ b <dbl> 396.90, 396.90, 392.83, 394.63, 396.90, 394.12, 395.60, 396.9…
## $ lstat <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.1…
## $ medv <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 1…
In this exercise, we will predict medv (median value of owner-occupied homes in USD). Such a model could in the real world be used to predict developments in housing prices, e.g., to inform policy makers or potential investors. In case I have only one target outcome, I prefer to name it y. This simple naming convention helps to reuse code across datasets.
data %<>%
rename(y = medv) %>%
select(y, everything())
Let's take a look at some descriptives. Here, I like to use the skimr package, which does that very neatly.
data %>% skimr::skim()
| Name | Piped data |
| Number of rows | 506 |
| Number of columns | 13 |
| _______________________ | |
| Column type frequency: | |
| numeric | 13 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 0 | 1 | 22.53 | 9.20 | 5.00 | 17.02 | 21.20 | 25.00 | 50.00 | ▂▇▅▁▁ |
| crim | 0 | 1 | 3.61 | 8.60 | 0.01 | 0.08 | 0.26 | 3.68 | 88.98 | ▇▁▁▁▁ |
| zn | 0 | 1 | 11.36 | 23.32 | 0.00 | 0.00 | 0.00 | 12.50 | 100.00 | ▇▁▁▁▁ |
| indus | 0 | 1 | 11.14 | 6.86 | 0.46 | 5.19 | 9.69 | 18.10 | 27.74 | ▇▆▁▇▁ |
| nox | 0 | 1 | 0.55 | 0.12 | 0.38 | 0.45 | 0.54 | 0.62 | 0.87 | ▇▇▆▅▁ |
| rm | 0 | 1 | 6.28 | 0.70 | 3.56 | 5.89 | 6.21 | 6.62 | 8.78 | ▁▂▇▂▁ |
| age | 0 | 1 | 68.57 | 28.15 | 2.90 | 45.02 | 77.50 | 94.07 | 100.00 | ▂▂▂▃▇ |
| dis | 0 | 1 | 3.80 | 2.11 | 1.13 | 2.10 | 3.21 | 5.19 | 12.13 | ▇▅▂▁▁ |
| rad | 0 | 1 | 9.55 | 8.71 | 1.00 | 4.00 | 5.00 | 24.00 | 24.00 | ▇▂▁▁▃ |
| tax | 0 | 1 | 408.24 | 168.54 | 187.00 | 279.00 | 330.00 | 666.00 | 711.00 | ▇▇▃▁▇ |
| ptratio | 0 | 1 | 18.46 | 2.16 | 12.60 | 17.40 | 19.05 | 20.20 | 22.00 | ▁▃▅▅▇ |
| b | 0 | 1 | 356.67 | 91.29 | 0.32 | 375.38 | 391.44 | 396.23 | 396.90 | ▁▁▁▁▇ |
| lstat | 0 | 1 | 12.65 | 7.14 | 1.73 | 6.95 | 11.36 | 16.96 | 37.97 | ▇▇▅▂▁ |
OK, time for some visual exploration. Here I will introduce the GGally package, an extension of ggplot2 that offers functions for very nice visual summaries in matrix form.
First, let's look at a classical correlation matrix.
data %>%
GGally::ggcorr(label = TRUE,
label_size = 3,
label_round = 2,
label_alpha = TRUE)
Even cooler, the ggpairs function creates a scatterplot matrix including all variable distributions and pairwise correlations.
data %>%
GGally::ggpairs(aes(alpha = 0.3),
ggtheme = theme_gray())
First, we split our data into a training and a test sample.
data_split <- initial_split(data, prop = 0.75, strata = y)
data_train <- data_split %>% training()
data_test <- data_split %>% testing()
Here, we apply only some simple transformations:
- We normalize all numeric data by centering (subtracting the mean) and scaling (dividing by the standard deviation).
- We remove features with near-zero variance, which would not help the model a lot.
- We also deal with missing data already during preprocessing. recipes has built-in missing-value imputation algorithms, such as k-nearest neighbors.
data_recipe <- data_train %>%
recipe(y ~.) %>%
step_center(all_numeric(), -all_outcomes()) %>% # Centers all numeric variables to mean = 0
step_scale(all_numeric(), -all_outcomes()) %>% # scales all numeric variables to sd = 1
step_nzv(all_predictors()) %>% # Removes predictors with near-zero variance
step_knnimpute(all_predictors()) %>% # knn imputation of missing values
prep()
data_recipe
## Data Recipe
##
## Inputs:
##
## role #variables
## outcome 1
## predictor 12
##
## Training data contained 381 data points and no missing data.
##
## Operations:
##
## Centering for crim, zn, indus, nox, rm, age, dis, rad, ... [trained]
## Scaling for crim, zn, indus, nox, rm, age, dis, rad, ... [trained]
## Sparse, unbalanced variable filter removed no terms [trained]
## K-nearest neighbor imputation for zn, indus, nox, rm, age, dis, rad, ... [trained]
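What step_center() and step_scale() do boils down to a one-liner in base R (the numbers below are made up); the important part is that the recipe learns the means and standard deviations on the training data and later applies those same values to new data:

```r
# Centering and scaling by hand, base R only:
x <- c(2, 4, 6, 8)
x_centered <- x - mean(x)          # subtract the mean -> mean becomes 0
x_scaled   <- x_centered / sd(x)   # divide by the sd  -> sd becomes 1
mean(x_scaled)  # ~0
sd(x_scaled)    # 1
```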
model_lm <- linear_reg(mode = 'regression') %>%
set_engine('lm')
model_el <- linear_reg(mode = 'regression',
penalty = tune(),
mixture = tune()) %>%
set_engine("glmnet")
model_rf <- rand_forest(mode = 'regression',
trees = 100,
mtry = tune(),
min_n = tune()
) %>%
set_engine('ranger',
importance = 'impurity')
workflow_general <- workflow() %>%
add_recipe(data_recipe)
workflow_lm <- workflow_general %>%
add_model(model_lm)
workflow_el <- workflow_general %>%
add_model(model_el)
workflow_rf <- workflow_general %>%
add_model(model_rf)
This time, we will use a bootstrap sampling strategy, where we draw n observations from the training sample with replacement, and repeat this process 5 times. As a consequence, each bootstrapped sample has the same size as the original one, although some observations appear multiple times and others not at all. This is a good resampling strategy when the initial number of observations is low.
data_resample <- bootstraps(data_train,
strata = y,
times = 5)
data_resample %>% glimpse()
## Rows: 5
## Columns: 2
## $ splits <named list> [<rsplit[381 x 138 x 381 x 13]>, <rsplit[381 x 149 x 38…
## $ id <chr> "Bootstrap1", "Bootstrap2", "Bootstrap3", "Bootstrap4", "Boots…
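The mechanics of a single bootstrap draw can be sketched in base R (the seed is arbitrary): sampling n out of n with replacement keeps the sample size constant, and on average only about 1 - 1/e ≈ 63.2% of the original observations end up in each draw.

```r
# One bootstrap draw by hand, base R only:
set.seed(42)
n <- 381                                 # size of our training sample
boot_idx <- sample(n, size = n, replace = TRUE)
length(boot_idx)                         # 381: same size as the original
mean(seq_len(n) %in% boot_idx)           # roughly 0.63 of observations included
```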
tune_el <-
tune_grid(
workflow_el,
resamples = data_resample,
grid = 10
)
tune_el %>% autoplot()
best_param_el <- tune_el %>% select_best(metric = 'rmse')
best_param_el
tune_el %>% show_best(metric = 'rmse', n = 1)
tune_rf <-
tune_grid(
workflow_rf,
resamples = data_resample,
grid = 10
)
tune_rf %>% autoplot()
best_param_rf <- tune_rf %>% select_best(metric = 'rmse')
best_param_rf
tune_rf %>% show_best(metric = 'rmse', n = 1)
Alright, now we can fit the final models. To do so, we first update the previously created workflows, replacing the tune() placeholders with the best-performing hyperparameter values we just determined.
workflow_final_el <- workflow_el %>%
finalize_workflow(parameters = best_param_el)
workflow_final_rf <- workflow_rf %>%
finalize_workflow(parameters = best_param_rf)
fit_lm <- workflow_lm %>%
fit(data_train)
fit_el <- workflow_final_el %>%
fit(data_train)
fit_rf <- workflow_final_rf %>%
fit(data_train)
pred_collected <- tibble(
truth = data_train %>% pull(y),
base = mean(truth),
lm = fit_lm %>% predict(new_data = data_train) %>% pull(.pred),
el = fit_el %>% predict(new_data = data_train) %>% pull(.pred),
rf = fit_rf %>% predict(new_data = data_train) %>% pull(.pred)
) %>%
pivot_longer(cols = -truth,
names_to = 'model',
values_to = '.pred')
pred_collected %>% head()
pred_collected %>%
group_by(model) %>%
rmse(truth = truth, estimate = .pred) %>%
select(model, .estimate) %>%
arrange(.estimate)
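Under the hood, yardstick's rmse() computes nothing more than the root of the mean squared error; a quick base-R check with made-up numbers:

```r
# RMSE by hand: sqrt(mean((truth - estimate)^2))
truth    <- c(24.0, 21.6, 34.7)
estimate <- c(25.1, 20.9, 33.0)
sqrt(mean((truth - estimate)^2))  # ~1.237
```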
pred_collected %>%
ggplot(aes(x = truth, y = .pred, color = model)) +
geom_abline(lty = 2, color = "gray80", size = 1.5) +
geom_point(alpha = 0.5) +
labs(
x = "Truth",
y = "Predicted price",
color = "Type of model"
)
So, now we are almost there. Since we know we will use the random forest, we only have to predict on our test sample and see how we fare.
fit_last_rf <- workflow_final_rf %>% last_fit(split = data_split)
fit_last_rf %>% collect_metrics()
fit_last_rf %>%
pluck(".workflow", 1) %>%
pull_workflow_fit() %>%
vip::vip(num_features = 10)
fit_el %>%
pull_workflow_fit() %>%
vip::vip(num_features = 10)
Customer churn refers to the situation when a customer ends their relationship with a company, and it’s a costly problem. Customers are the fuel that powers a business. Loss of customers impacts sales. Further, it’s much more difficult and costly to gain new customers than it is to retain existing customers. As a result, organizations need to focus on reducing customer churn.
The good news is that machine learning can help. For many businesses that offer subscription-based services, it's critical to both predict customer churn and explain what features relate to customer churn.
We now dive into the IBM Watson Telco Dataset. According to IBM, the business challenge is:
A telecommunications company [Telco] is concerned about the number of customers leaving their landline business for cable competitors. They need to understand who is leaving. Imagine that you’re an analyst at this company and you have to find out who is leaving and why.
The dataset includes information about customers' demographics, their account details, the services they signed up for, their charges, and whether they churned.
data <- readRDS(url("https://www.dropbox.com/s/zlqt05ugivl627a/telco_churn.rds?dl=1")) # notice that for readRDS I have to wrap the address in url()
data %>%
glimpse()
## Rows: 7,043
## Columns: 21
## $ customerID <chr> "7590-VHVEG", "5575-GNVDE", "3668-QPYBK", "7795-CFOC…
## $ gender <chr> "Female", "Male", "Male", "Male", "Female", "Female"…
## $ SeniorCitizen <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Partner <chr> "Yes", "No", "No", "No", "No", "No", "No", "No", "Ye…
## $ Dependents <chr> "No", "No", "No", "No", "No", "No", "Yes", "No", "No…
## $ tenure <int> 1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 58, 49, …
## $ PhoneService <chr> "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No",…
## $ MultipleLines <chr> "No phone service", "No", "No", "No phone service", …
## $ InternetService <chr> "DSL", "DSL", "DSL", "DSL", "Fiber optic", "Fiber op…
## $ OnlineSecurity <chr> "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes", …
## $ OnlineBackup <chr> "Yes", "No", "Yes", "No", "No", "No", "Yes", "No", "…
## $ DeviceProtection <chr> "No", "Yes", "No", "Yes", "No", "Yes", "No", "No", "…
## $ TechSupport <chr> "No", "No", "No", "Yes", "No", "No", "No", "No", "Ye…
## $ StreamingTV <chr> "No", "No", "No", "No", "No", "Yes", "Yes", "No", "Y…
## $ StreamingMovies <chr> "No", "No", "No", "No", "No", "Yes", "No", "No", "Ye…
## $ Contract <chr> "Month-to-month", "One year", "Month-to-month", "One…
## $ PaperlessBilling <chr> "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "No",…
## $ PaymentMethod <chr> "Electronic check", "Mailed check", "Mailed check", …
## $ MonthlyCharges <dbl> 29.85, 56.95, 53.85, 42.30, 70.70, 99.65, 89.10, 29.…
## $ TotalCharges <dbl> 29.85, 1889.50, 108.15, 1840.75, 151.65, 820.50, 194…
## $ Churn <chr> "No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "…
data %<>%
rename(y = Churn) %>%
select(y, everything(), -customerID)
data %>% skimr::skim()
| Name | Piped data |
| Number of rows | 7043 |
| Number of columns | 20 |
| _______________________ | |
| Column type frequency: | |
| character | 16 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| y | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| Partner | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| Dependents | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| PhoneService | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| MultipleLines | 0 | 1 | 2 | 16 | 0 | 3 | 0 |
| InternetService | 0 | 1 | 2 | 11 | 0 | 3 | 0 |
| OnlineSecurity | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| OnlineBackup | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| DeviceProtection | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| TechSupport | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| StreamingTV | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| StreamingMovies | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| Contract | 0 | 1 | 8 | 14 | 0 | 3 | 0 |
| PaperlessBilling | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| PaymentMethod | 0 | 1 | 12 | 25 | 0 | 4 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| SeniorCitizen | 0 | 1 | 0.16 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| tenure | 0 | 1 | 32.37 | 24.56 | 0.00 | 9.00 | 29.00 | 55.00 | 72.00 | ▇▃▃▃▆ |
| MonthlyCharges | 0 | 1 | 64.76 | 30.09 | 18.25 | 35.50 | 70.35 | 89.85 | 118.75 | ▇▅▆▇▅ |
| TotalCharges | 11 | 1 | 2283.30 | 2266.77 | 18.80 | 401.45 | 1397.47 | 3794.74 | 8684.80 | ▇▂▂▂▁ |
Next, let's do a first visual inspection. Many models in the prediction exercise to follow require the conditional distribution of the features to differ between the outcome states to be predicted. So, let's take a look. Here, ggplot2 plus the ggridges package is my favorite. It is particularly helpful when dealing with many variables, where you want to see differences in their conditional distribution with respect to an outcome of interest.
data %>%
gather(variable, value, -y) %>% # Note: At one point do pivot_longer instead
ggplot(aes(y = as.factor(variable),
fill = as.factor(y),
x = percent_rank(value)) ) +
ggridges::geom_density_ridges(alpha = 0.75)
data_split <- initial_split(data, prop = 0.75, strata = y)
data_train <- data_split %>% training()
data_test <- data_split %>% testing()
Here, I do the following preprocessing:
- Log-transform TotalCharges, which is heavily right-skewed.
- Center and scale all numeric variables.
- Dummy-encode all nominal predictors.
- Impute missing values (TotalCharges has 11) with k-nearest neighbors.
data_recipe <- data_train %>%
recipe(y ~.) %>%
#step_discretize(tenure, options = list(cuts = 3)) %>%
step_log(TotalCharges) %>%
step_center(all_numeric(), -all_outcomes()) %>%
step_scale(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes()) %>%
step_knnimpute(all_predictors()) %>% # knn imputation of missing values
prep()
model_lg <- logistic_reg(mode = 'classification') %>%
set_engine('glm', family = binomial)
model_dt <- decision_tree(mode = 'classification',
cost_complexity = tune(),
tree_depth = tune(),
min_n = tune()
) %>%
set_engine('rpart')
model_xg <- boost_tree(mode = 'classification',
trees = 100,
mtry = tune(),
min_n = tune(),
tree_depth = tune(),
learn_rate = tune()
) %>%
set_engine("xgboost")
workflow_general <- workflow() %>%
add_recipe(data_recipe)
workflow_lg <- workflow_general %>%
add_model(model_lg)
workflow_dt <- workflow_general %>%
add_model(model_dt)
workflow_xg <- workflow_general %>%
add_model(model_xg)
data_resample <- data_train %>%
vfold_cv(strata = y,
v = 3,
repeats = 3)
tune_dt <-
tune_grid(
workflow_dt,
resamples = data_resample,
grid = 10
)
tune_dt %>% autoplot()
best_param_dt <- tune_dt %>% select_best(metric = 'roc_auc')
best_param_dt
tune_dt %>% show_best(metric = 'roc_auc', n = 1)
tune_xg <-
tune_grid(
workflow_xg,
resamples = data_resample,
grid = 10
)
tune_xg %>% autoplot()
best_param_xg <- tune_xg %>% select_best(metric = 'roc_auc')
best_param_xg
tune_xg %>% show_best(metric = 'roc_auc', n = 1)
workflow_final_dt <- workflow_dt %>%
finalize_workflow(parameters = best_param_dt)
workflow_final_xg <- workflow_xg %>%
finalize_workflow(parameters = best_param_xg)
fit_lg <- workflow_lg %>%
fit(data_train)
fit_dt <- workflow_final_dt %>%
fit(data_train)
fit_xg <- workflow_final_xg %>%
fit(data_train)
pred_collected <- tibble(
truth = data_train %>% pull(y) %>% as.factor(),
#base = mean(truth),
lg = fit_lg %>% predict(new_data = data_train) %>% pull(.pred_class),
dt = fit_dt %>% predict(new_data = data_train) %>% pull(.pred_class),
xg = fit_xg %>% predict(new_data = data_train) %>% pull(.pred_class)
) %>%
pivot_longer(cols = -truth,
names_to = 'model',
values_to = '.pred')
pred_collected %>% head()
pred_collected %>%
group_by(model) %>%
accuracy(truth = truth, estimate = .pred) %>%
select(model, .estimate) %>%
arrange(desc(.estimate))
pred_collected %>%
group_by(model) %>%
bal_accuracy(truth = truth, estimate = .pred) %>%
select(model, .estimate) %>%
arrange(desc(.estimate))
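Balanced accuracy matters here because churn is imbalanced: a model that always predicts "No" already scores high on plain accuracy. bal_accuracy() averages sensitivity and specificity, which such a model cannot game. A base-R check of the formula on a toy confusion (the labels below are made up):

```r
# Balanced accuracy by hand: mean of sensitivity and specificity.
truth <- factor(c("Yes", "Yes", "No", "No", "No", "No"))
pred  <- factor(c("Yes", "No",  "No", "No", "No", "Yes"))
sens <- sum(pred == "Yes" & truth == "Yes") / sum(truth == "Yes")  # 1/2
spec <- sum(pred == "No"  & truth == "No")  / sum(truth == "No")   # 3/4
(sens + spec) / 2  # 0.625
```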
Surprisingly, here the less complex model seems to have the edge!
So, now we are almost there. Since we know we will use the decision tree, we only have to predict on our test sample and see how we fare.
fit_last_dt <- workflow_final_dt %>% last_fit(split = data_split)
fit_last_dt %>% collect_metrics()
fit_last_dt %>%
pluck(".workflow", 1) %>%
pull_workflow_fit() %>%
vip::vip(num_features = 10)
fit_xg %>%
pull_workflow_fit() %>%
vip::vip(num_features = 10)
tidymodels: Tidy statistical and predictive modeling ecosystem
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/C/C/C/C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] mlbench_2.1-1 yardstick_0.0.6 workflows_0.1.1 tune_0.1.0
## [5] rsample_0.0.6 recipes_0.1.10 parsnip_0.1.0 infer_0.5.1
## [9] dials_0.0.6 scales_1.1.0 broom_0.5.6 tidymodels_0.1.0
## [13] magrittr_1.5 forcats_0.5.0 stringr_1.4.0 dplyr_0.8.5
## [17] purrr_0.3.4 readr_1.3.1 tidyr_1.0.2 tibble_3.0.1
## [21] ggplot2_3.3.0 tidyverse_1.3.0 pacman_0.5.1 knitr_1.28
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.1.6 tidytext_0.2.4
## [4] plyr_1.8.6 igraph_1.2.5 repr_1.1.0
## [7] splines_3.6.3 crosstalk_1.1.0.1 listenv_0.8.0
## [10] SnowballC_0.7.0 rstantools_2.0.0 inline_0.3.15
## [13] digest_0.6.25 foreach_1.5.0 htmltools_0.4.0
## [16] rsconnect_0.8.16 fansi_0.4.1 globals_0.12.5
## [19] modelr_0.1.6 gower_0.2.1 matrixStats_0.56.0
## [22] xts_0.12-0 hardhat_0.1.2 prettyunits_1.1.1
## [25] colorspace_1.4-1 vip_0.2.2 skimr_2.1.1
## [28] rvest_0.3.5 haven_2.2.0 xfun_0.13
## [31] callr_3.4.3 crayon_1.3.4 jsonlite_1.6.1
## [34] lme4_1.1-23 survival_3.1-12 zoo_1.8-7
## [37] iterators_1.0.12 glue_1.4.0 gtable_0.3.0
## [40] ipred_0.9-9 pkgbuild_1.0.6 rstan_2.19.3
## [43] shape_1.4.4 GGally_1.5.0 DBI_1.1.0
## [46] miniUI_0.1.1.1 Rcpp_1.0.4.6 xtable_1.8-4
## [49] GPfit_1.0-8 StanHeaders_2.21.0-1 stats4_3.6.3
## [52] lava_1.6.7 prodlim_2019.11.13 DT_0.13
## [55] glmnet_3.0-2 htmlwidgets_1.5.1 httr_1.4.1
## [58] threejs_0.3.3 RColorBrewer_1.1-2 ellipsis_0.3.0
## [61] farver_2.0.3 reshape_0.8.8 loo_2.2.0
## [64] pkgconfig_2.0.3 nnet_7.3-13 dbplyr_1.4.3
## [67] utf8_1.1.4 labeling_0.3 tidyselect_1.0.0
## [70] rlang_0.4.5 DiceDesign_1.8-1 reshape2_1.4.4
## [73] later_1.0.0 munsell_0.5.0 cellranger_1.1.0
## [76] tools_3.6.3 xgboost_1.0.0.2 cli_2.0.2
## [79] generics_0.0.2 ranger_0.12.1 ggridges_0.5.2
## [82] evaluate_0.14 fastmap_1.0.1 yaml_2.2.1
## [85] processx_3.4.2 fs_1.4.1 future_1.17.0
## [88] nlme_3.1-147 mime_0.9 rstanarm_2.19.3
## [91] xml2_1.3.2 tokenizers_0.2.1 compiler_3.6.3
## [94] bayesplot_1.7.1 shinythemes_1.1.2 rstudioapi_0.11
## [97] reprex_0.3.0 tidyposterior_0.0.2 statmod_1.4.34
## [100] lhs_1.0.2 stringi_1.4.6 highr_0.8
## [103] ps_1.3.2 lattice_0.20-41 Matrix_1.2-18
## [106] nloptr_1.2.2.1 markdown_1.1 shinyjs_1.1
## [109] vctrs_0.2.4 pillar_1.4.3 lifecycle_0.2.0
## [112] furrr_0.1.0 data.table_1.12.8 httpuv_1.5.2
## [115] R6_2.4.1 promises_1.1.0 gridExtra_2.3
## [118] janeaustenr_0.1.5 codetools_0.2-16 boot_1.3-24
## [121] colourpicker_1.0 MASS_7.3-51.5 gtools_3.8.2
## [124] assertthat_0.2.1 withr_2.2.0 shinystan_2.5.0
## [127] parallel_3.6.3 hms_0.5.3 grid_3.6.3
## [130] rpart_4.1-15 timeDate_3043.102 minqa_1.2.4
## [133] class_7.3-16 rmarkdown_2.1 pROC_1.16.2
## [136] tidypredict_0.4.5 shiny_1.4.0.2 lubridate_1.7.8
## [139] base64enc_0.1-3 dygraphs_1.1.1.6